Robust stochastic parsing using the inside-outside algorithm
Abstract
Development of a robust syntactic parser capable of returning the unique, correct and syntactically determinate analysis for arbitrary naturally-occurring input will require solutions to two critical problems with most, if not all, current wide-coverage parsing systems; namely, resolution of structural ambiguity and undergeneration. Typically, resolution of syntactic ambiguity has been conceived as the problem of representing and deploying non-syntactic (semantic, pragmatic, phonological) knowledge. However, this approach has not proved fruitful so far except for small and simple domains, and even in these cases it remains labour intensive. In addition, some naturally-occurring sentences will not be correctly analysed (or analysed at all) by a parser deploying a generative grammar based on the assumption that the grammatical sentences of a natural language constitute a well-formed set (e.g. Sampson, 1987a,b; Taylor et al., 1989). Little attention has been devoted to this latter problem; however, the increasing quantities of machine-readable text requiring linguistic classification, both for purposes of research and information retrieval, make it increasingly topical. In this paper, we discuss the application of the Viterbi algorithm and the Baum-Welch algorithm (in wide use for speech recognition) to the parsing problem and describe a recent experiment designed to produce a simple, robust, stochastic parser which selects an appropriate analysis frequently enough to be useful and deals effectively with the problem of undergeneration. We focus on the application of these stochastic algorithms here because, although other statistically based approaches have been proposed (e.g. Sampson et al., 1989; Garside & Leech, 1985; Magerman & Marcus, 1991a,b), these appear most promising as they are computationally tractable (in principle) and well-integrated with formal language / automata theory.
The Viterbi algorithm and Baum-Welch algorithm are optimised algorithms (with polynomial computational complexity) which can be used in conjunction with stochastic regular grammars (finite-state automata, i.e. (hidden) Markov models; Baum, 1972) and with stochastic context-free grammars (Baker, 1982; Fujisaki et al., 1989) to select the most probable analysis of a sentence and to (re-)estimate the probabilities of the rules (non-zero parameters) defined by the grammar, respectively. The Viterbi algorithm computes the maximally probable derivation with polynomial resources despite the exponential space of possible derivations (e.g. Church & Patil, 1983) by exploiting the stochastic assumption and pruning all non-maximal paths leading to the set of states / non-terminals compatible with the input at each step in the parsing process. The Baum-Welch algorithm (which is often called the forward-backward algorithm when applied to regular grammars and the inside-outside algorithm with context-free grammars) computes the probability of each possible derivation, also with polynomial resources, by factoring the computation across each state / non-terminal involved in any derivation. A detailed and clear description of these algorithms is provided by de Rose (1988), Holmes (1988) and Lari & Young (1990), amongst others. These algorithms will converge towards a local optimum when used to iteratively re-estimate probabilities on a training corpus in a manner which maximises the likelihood of the training corpus given the grammar.
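The contrast between the two computations can be illustrated with a minimal sketch, which is not the parser described in the paper. It assumes a hypothetical toy grammar in Chomsky Normal Form with made-up probabilities (`BINARY`, `LEXICAL`) and a hypothetical helper `chart_score`. The same CKY-style chart is filled twice: summing over alternative analyses of a span gives the inside probability of the sentence, while keeping only the maximum gives the probability of the Viterbi (most probable) derivation. The outside pass and the re-estimation step are omitted.

```python
# Hypothetical toy grammar in Chomsky Normal Form; all probabilities are
# illustrative assumptions, not values from the paper.
BINARY = {  # (parent, left child, right child) -> rule probability
    ("S", "NP", "VP"): 1.0,
    ("NP", "NP", "PP"): 0.2,
    ("VP", "V", "NP"): 0.6,
    ("VP", "VP", "PP"): 0.4,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {  # (parent, word) -> rule probability
    ("NP", "she"): 0.3,
    ("NP", "stars"): 0.3,
    ("NP", "telescopes"): 0.2,
    ("V", "saw"): 1.0,
    ("P", "with"): 1.0,
}


def chart_score(words, combine, start="S"):
    """Fill a CKY chart bottom-up and return the score of `start` over the
    whole input. `combine` merges the scores of alternative analyses of the
    same span: `sum` yields the inside probability, `max` the Viterbi one."""
    n = len(words)
    chart = {}  # (i, j, nonterminal) -> score for words[i:j]

    # Width-1 spans come straight from the lexical rules.
    for i, word in enumerate(words):
        for (parent, w), p in LEXICAL.items():
            if w == word:
                chart[(i, i + 1, parent)] = p

    # Wider spans are built from two adjacent, already-scored sub-spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            alternatives = {}  # nonterminal -> scores of competing analyses
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    l_score = chart.get((i, k, left), 0.0)
                    r_score = chart.get((k, j, right), 0.0)
                    if l_score > 0.0 and r_score > 0.0:
                        alternatives.setdefault(parent, []).append(
                            p * l_score * r_score
                        )
            for parent, scores in alternatives.items():
                chart[(i, j, parent)] = combine(scores)

    return chart.get((0, n, start), 0.0)


sentence = "she saw stars with telescopes".split()
inside = chart_score(sentence, sum)   # total probability of all derivations
viterbi = chart_score(sentence, max)  # probability of the best derivation
print(f"inside={inside:.5f} viterbi={viterbi:.5f}")
```

Under this toy grammar the sentence has two derivations (the prepositional phrase attaches either to the verb phrase or to the noun phrase), so the inside score is the sum of both derivation probabilities and the Viterbi score is the larger of the two. Both passes visit the same polynomial number of chart cells, although the number of derivations can grow exponentially with sentence length.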
Similar resources
Parsing Inside-Out
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given non-terminal covers any piece of the input sentence. The traditional use of the...
Reestimation and Best-First Parsing Algorithm for Probabilistic Dependency Grammars
This paper presents a reestimation algorithm and a best-first parsing (BFP) algorithm for probabilistic dependency grammars (PDG). The proposed reestimation algorithm is a variation of the inside-outside algorithm adapted to probabilistic dependency grammars. The inside-outside algorithm is a probabilistic parameter reestimation algorithm for phrase structure grammars in Chomsky Normal Form (CN...
Iterative CKY Parsing for Probabilistic Context-Free Grammars
This paper presents an iterative CKY parsing algorithm for probabilistic context-free grammars (PCFG). This algorithm enables us to prune unnecessary edges produced during parsing, which results in more efficient parsing. Since pruning is done by using the edge's inside Viterbi probability and the upper-bound of the outside Viterbi probability, this algorithm guarantees to output the exact Viter...
Parsing the Wall Street Journal with the Inside-Outside Algorithm
We report grammar inference experiments on partially parsed sentences taken from the Wall Street Journal corpus using the inside-outside algorithm for stochastic context-free grammars. The initial grammar for the inference process makes no assumption of the kinds of structures and their distributions. The inferred grammar is evaluated by its predicting power and by comparing the bracketing of ...
Comparison Between the Inside-Outside Algorithm and the Viterbi Algorithm for Stochastic Context-Free Grammars
The most popular algorithms for the estimation of the probabilities of a context-free grammar are the Inside-Outside algorithm and the Viterbi algorithm, which are Maximum Likelihood approaches. The difference between the logarithm of the likelihood of a string and the logarithm of the likelihood of the most probable parse of a string is upper bounded linearly by the length of the string and the...
Journal title:
- CoRR
Volume abs/cmp-lg/9412006 Issue
Pages -
Publication date 1994